Statistic-based isolation of lethargic drives

ABSTRACT

A method is provided for identifying a lethargic drive. The method includes executing a command directed to at least two drives in a redundant array of independent disks (RAID) configuration. Each of the drives of the at least two drives is associated with a plurality of timing buckets. The method also includes determining a completion time of the command, and, for each of the at least two drives that the command was directed to, counting the completion time of the command in one of the timing buckets associated with the drive.

BACKGROUND

The present invention relates to storage arrays, and more specifically, this invention relates to identifying a lethargic drive within a storage array.

RAID (redundant array of independent disks) is a data storage approach that combines multiple storage drives into a single logical unit for purposes of increasing performance and/or data reliability. However, when a single drive in the array is slower than the others, it may negatively affect the performance of the whole array.

Modern RAID hardware receives a chain of operations that directs the storage and/or retrieval of data on individual drives of an array. When a chain of operations takes longer to complete than expected, it may not be possible to observe and time the operations directed to individual drives and determine whether a specific drive is responsible for the delay. Further complexity may arise when performance assists allow operations to be chained for update writes. Any drive in the chain may be the cause of the overall slow I/O.

Accordingly, isolating a lethargic drive (i.e., a drive that has a response time above an established norm) in a RAID configuration may be problematic. While drives that cause timeouts are easily identified because the failed command indicates the failing drive, a lethargic drive may be hidden by the RAID configuration.

In some systems, performance statistics may be accumulated for an array as a whole, so individual drive data is lost. Further, average response time data may be available, but lethargic drive responses may be concealed in an average response time view.

BRIEF SUMMARY

In one general embodiment, a method includes executing a command directed to at least two drives in a redundant array of independent disks (RAID) configuration, where each of the drives of the at least two drives is associated with a plurality of timing buckets. The method also includes determining a completion time of the command. Additionally, the method includes, for each of the at least two drives that the command was directed to, counting the completion time of the command in one of the timing buckets associated with the drive.

In another general embodiment, a computer program product is provided for identifying a lethargic drive. The computer program product comprises a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a processor to cause the processor to execute a command directed to at least two drives in a redundant array of independent disks (RAID) configuration, where each of the drives of the at least two drives is associated with a plurality of timing buckets. The program instructions are also executable by the processor to cause the processor to determine a completion time of the command, and, for each of the at least two drives that the command was directed to, count the completion time of the command in one of the timing buckets associated with the drive.

In another general embodiment, a system comprises a processor and logic integrated with and/or executable by the processor. The logic is configured to execute a command directed to at least two drives in a redundant array of independent disks (RAID) configuration, where each of the drives of the at least two drives is associated with a plurality of timing buckets. The logic is further configured to determine a completion time of the command, and, for each of the at least two drives that the command was directed to, count the completion time of the command in one of the timing buckets associated with the drive.

Other aspects and embodiments of the present invention will become apparent from the following detailed description, which, when taken in conjunction with the drawings, illustrates by way of example the principles of the invention.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 illustrates a network architecture, in accordance with one embodiment.

FIG. 2 shows a representative hardware environment that may be associated with the servers and/or clients of FIG. 1, in accordance with one embodiment.

FIG. 3 illustrates a method for isolating a lethargic drive, in accordance with one embodiment.

FIG. 4 illustrates commands executing against a RAID configuration, in accordance with another embodiment.

DETAILED DESCRIPTION

The following description is made for the purpose of illustrating the general principles of the present invention and is not meant to limit the inventive concepts claimed herein. Further, particular features described herein can be used in combination with other described features in each of the various possible combinations and permutations.

Unless otherwise specifically defined herein, all terms are to be given their broadest possible interpretation including meanings implied from the specification as well as meanings understood by those skilled in the art and/or as defined in dictionaries, treatises, etc.

It must also be noted that, as used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless otherwise specified. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The following description discloses several preferred embodiments of systems, methods and computer program products for isolating a lethargic drive in a RAID configuration.

In one general embodiment, a method includes executing a command directed to at least two drives in a redundant array of independent disks (RAID) configuration, where each of the drives of the at least two drives is associated with a plurality of timing buckets. The method also includes determining a completion time of the command. Additionally, the method includes, for each of the at least two drives that the command was directed to, counting the completion time of the command in one of the timing buckets associated with the drive.

In another general embodiment, a computer program product is provided for identifying a lethargic drive. The computer program product comprises a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a processor to cause the processor to execute a command directed to at least two drives in a redundant array of independent disks (RAID) configuration, where each of the drives of the at least two drives is associated with a plurality of timing buckets. The program instructions are also executable by the processor to cause the processor to determine a completion time of the command, and, for each of the at least two drives that the command was directed to, count the completion time of the command in one of the timing buckets associated with the drive.

In another general embodiment, a system comprises a processor and logic integrated with and/or executable by the processor. The logic is configured to execute a command directed to at least two drives in a redundant array of independent disks (RAID) configuration, where each of the drives of the at least two drives is associated with a plurality of timing buckets. The logic is further configured to determine a completion time of the command, and, for each of the at least two drives that the command was directed to, count the completion time of the command in one of the timing buckets associated with the drive.

FIG. 1 illustrates an architecture 100, in accordance with one embodiment. As shown in FIG. 1, a plurality of remote networks 102 are provided including a first remote network 104 and a second remote network 106. A gateway 101 may be coupled between the remote networks 102 and a proximate network 108. In the context of the present architecture 100, the networks 104, 106 may each take any form including, but not limited to, a LAN, a WAN such as the Internet, public switched telephone network (PSTN), internal telephone network, etc.

In use, the gateway 101 serves as an entrance point from the remote networks 102 to the proximate network 108. As such, the gateway 101 may function as a router, which is capable of directing a given packet of data that arrives at the gateway 101, and a switch, which furnishes the actual path in and out of the gateway 101 for a given packet.

Further included is at least one data server 114 coupled to the proximate network 108, and which is accessible from the remote networks 102 via the gateway 101. It should be noted that the data server(s) 114 may include any type of computing device/groupware. Coupled to each data server 114 is a plurality of user devices 116. User devices 116 may also be connected directly through one of the networks 104, 106, 108. Such user devices 116 may include a desktop computer, lap-top computer, hand-held computer, printer or any other type of logic. It should be noted that a user device 111 may also be directly coupled to any of the networks, in one embodiment.

A peripheral 120 or series of peripherals 120, e.g., facsimile machines, printers, networked and/or local storage units or systems, etc., may be coupled to one or more of the networks 104, 106, 108. It should be noted that databases and/or additional components may be utilized with, or integrated into, any type of network element coupled to the networks 104, 106, 108. In the context of the present description, a network element may refer to any component of a network.

According to some approaches, methods and systems described herein may be implemented with and/or on virtual systems and/or systems which emulate one or more other systems, such as a UNIX system which emulates an IBM z/OS environment, a UNIX system which virtually hosts a MICROSOFT WINDOWS environment, a MICROSOFT WINDOWS system which emulates an IBM z/OS environment, etc. This virtualization and/or emulation may be enhanced through the use of VMWARE software, in some embodiments.

In more approaches, one or more networks 104, 106, 108, may represent a cluster of systems commonly referred to as a “cloud.” In cloud computing, shared resources, such as processing power, peripherals, software, data, servers, etc., are provided to any system in the cloud in an on-demand relationship, thereby allowing access and distribution of services across many computing systems. Cloud computing typically involves an Internet connection between the systems operating in the cloud, but other techniques of connecting the systems may also be used.

FIG. 2 shows a representative hardware environment associated with auser device 116 and/or server 114 of FIG. 1, in accordance with oneembodiment. Such figure illustrates a typical hardware configuration ofa workstation having a central processing unit 210, such as amicroprocessor, and a number of other units interconnected via a systembus 212.

The workstation shown in FIG. 2 includes a Random Access Memory (RAM)214, Read Only Memory (ROM) 216, an I/O adapter 218 for connectingperipheral devices such as disk storage units 220 to the bus 212, a userinterface adapter 222 for connecting a keyboard 224, a mouse 226, aspeaker 228, a microphone 232, and/or other user interface devices suchas a touch screen and a digital camera (not shown) to the bus 212,communication adapter 234 for connecting the workstation to acommunication network 235 (e.g., a data processing network) and adisplay adapter 236 for connecting the bus 212 to a display device 238.

The workstation may have resident thereon an operating system such asthe Microsoft Windows® Operating System (OS), a MAC OS, a UNIX OS, etc.It will be appreciated that a preferred embodiment may also beimplemented on platforms and operating systems other than thosementioned. A preferred embodiment may be written using XML, C, and/orC++ language, or other programming languages, along with an objectoriented programming methodology. Object oriented programming (OOP),which has become increasingly used to develop complex applications, maybe used.

Now referring to FIG. 3, a flowchart of a method 300 for isolating a lethargic drive is shown according to one embodiment. The method 300 may be performed in accordance with the present invention in any of the environments depicted in FIGS. 1-2, among others, in various embodiments. Of course, more or fewer operations than those specifically described in FIG. 3 may be included in method 300, as would be understood by one of skill in the art upon reading the present descriptions.

Each of the steps of the method 300 may be performed by any suitable component of the operating environment. For example, in various embodiments, the method 300 may be partially or entirely performed by a RAID controller, or some other device having one or more processors therein. The processor, e.g., processing circuit(s), chip(s), and/or module(s) implemented in hardware and/or software, and preferably having at least one hardware component, may be utilized in any device to perform one or more steps of the method 300. Illustrative processors include, but are not limited to, a central processing unit (CPU), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), etc., combinations thereof, or any other suitable computing device known in the art.

As shown in FIG. 3, method 300 may initiate with operation 302, where a command is executed. The executed command is directed to at least two drives in a RAID configuration. As noted below, the drives may be any type of storage device usable in a RAID configuration, e.g., solid state drives, hard disk drives, etc.

As used herein, a command may include any instruction to a device to cause the device to perform a specific task. Further, the command may be directed to a drive when it causes any I/O operation (e.g., a read, a write, a modify, etc.) on the drive. Thus, the command may be directed to the at least two drives of the RAID configuration when the command results in any I/O operations on the at least two drives. In one embodiment, the command may include instructions that cause reads and/or writes to one or more drives of the RAID configuration. The reads and/or writes to the one or more drives may include updates to parity information, such as parity blocks.

In one particular embodiment, the command directed to the at least two drives includes read operations that result in reading data blocks from two or more drives. In another embodiment, the command directed to the at least two drives includes a write operation that results in writing data blocks on one or more drives and updating a parity block on one or more drives. For example, the command directed to the at least two drives may include a write operation to a data block of a first drive and an update to a parity block of a second drive. The parity block may track parity for all data blocks of a given stripe across drives of the array.

In some embodiments, the command directed to the at least two drives of the RAID configuration may include a chain of operations. For example, the chain of operations in the command may include: a first read operation to read one or more data blocks of a first drive, a second read operation to read one or more data blocks of a second drive, an XOR operation of the data blocks of the first drive and the data blocks of the second drive, and then writing the results of the XOR operation to a third drive.

In various embodiments, a drive includes any physical unit of storage that is capable of being combined with other physical units of storage to comprise a single logical unit. For example, the drives may include hard disk drives (HDDs), solid state drives (SSDs), volatile memory, non-volatile memory, etc. Accordingly, in various embodiments, the RAID configuration may include a single logical unit comprised of an association of multiple physical units. The single logical unit may be constructed for purposes of data redundancy and/or improving access performance.

Still yet, as shown at operation 302, each of the drives is associated with a plurality of timing buckets. As used herein, a timing bucket includes any logical construct capable of counting or tracking an integer. Additionally, each of the timing buckets of a drive may be associated with a different time interval. Further, all of the time intervals may be constructed with reference to a single baseline. For example, for a given drive, there may be a number of different timing buckets associated with the drive. Additionally, each drive of the at least two drives may be associated with its own set of the number of different timing buckets.

In one particular example, for a given drive, there may be four timing buckets associated with the drive. The four timing buckets may include a first bucket associated with a time of >100 milliseconds (ms), a second bucket associated with a time of >250 ms, a third bucket associated with a time of >500 ms, and a fourth bucket associated with a time of >1 s. Further, each of the drives in the RAID configuration is associated with its own set of timing buckets. For example, for a given RAID configuration of three drives, there may be three >100 ms buckets, where each of the >100 ms buckets is associated with a respective drive.
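By way of illustration only (and not as part of any claimed embodiment), the per-drive timing buckets described above could be represented as a set of counters per drive; the names and thresholds in the following sketch simply mirror the >100 ms, >250 ms, >500 ms, and >1 s example:

    # Illustrative sketch: each drive carries its own independent set of
    # timing buckets, one counter per example threshold.
    BUCKET_THRESHOLDS_MS = (100, 250, 500, 1000)   # >100 ms, >250 ms, >500 ms, >1 s

    def new_bucket_set():
        return {threshold: 0 for threshold in BUCKET_THRESHOLDS_MS}

    # A three-drive RAID configuration: three independent >100 ms buckets,
    # three independent >250 ms buckets, and so on.
    timing_buckets = {drive_id: new_bucket_set() for drive_id in range(3)}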

Of course, such timing buckets are presented for the purposes of simplifying the present discussion, and it is contemplated that any number of timing buckets greater or less than four may be used for each drive.

Similarly, it is contemplated that the timing buckets may be associated with time intervals other than those presented above. For example, the timing buckets may be based on a 100 ms interval (e.g., >100 ms, >200 ms, >300 ms, >400 ms, >500 ms, >600 ms, . . . etc.), or may be based on a 200 ms interval (e.g., >200 ms, >400 ms, >600 ms, >800 ms, . . . etc.), or any other timing interval.

In some embodiments, each of the time intervals may be considered to be a slow command time. For example, in a system employing the four timing buckets of >100 ms, >250 ms, >500 ms, and >1 s, any command that takes more than 100 ms to complete execution may be considered a slow command.

Next, at operation 304, a completion time of the command is determined. The completion time of the command may include any calculated or measured amount of time that elapsed during performance of the command. In one embodiment, the completion time may include an elapsed time from receipt of the command to completion of execution of the command. In another embodiment, the completion time may include an elapsed time from beginning execution of the command to completion of execution of the command.

Further, at operation 306, for each of the at least two drives that the command was directed to, the completion time of the command is counted in one of the timing buckets associated with the drive. Counting the completion time in the one of the timing buckets associated with the drive may include incrementing a counter of the timing bucket. For example, if a command is directed to three drives of an array, then the completion time of the command is counted in three timing buckets: a corresponding timing bucket for each of the three drives. Counting the completion time of the command may include incrementing a counter associated with each of the three buckets.
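A hedged sketch of operations 304 and 306 follows; the function and variable names are illustrative assumptions, and the execute and choose_bucket callables merely stand in for whatever the RAID controller actually does:

    import time

    # Sketch only: time a command once, then count its completion time against
    # a bucket of every drive the command was directed to.
    def run_and_count(execute, drives_involved, timing_buckets, choose_bucket):
        start = time.monotonic()
        execute()                                             # operation 302
        completion_ms = (time.monotonic() - start) * 1000.0   # operation 304
        bucket = choose_bucket(completion_ms)
        if bucket is not None:                                # operation 306
            for drive_id in drives_involved:
                timing_buckets[drive_id][bucket] += 1
        return completion_ms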

As noted above, a chain of operations may result in reads and writes to a group of drives. Further, more than one of the drives may be involved in any particular operation. While an aggregate completion time of the command may be determined, in the case of a delay in execution of the command there may be no information available to indicate which of the drives is responsible for the delay.

However, it may be known which drives of the array were accessed during execution of a given command. By counting the completion time in the timing buckets of every resource that may have been involved in execution of the command, a lethargic drive may, over time, be identified as the drive with the highest timing bucket counts.

For a given command, the completion time of the command may be counted against the highest applicable bucket for each of the drives. As a specific example, in a system that maintains >100 ms, >250 ms, >500 ms, and >1 s timing buckets for each of the drives of a four-drive array, if a command is directed to three of the drives, and a completion time of the command is timed at 600 ms, then a >500 ms timing bucket of the first drive would be incremented, a >500 ms timing bucket of the second drive would be incremented, and a >500 ms timing bucket of the third drive would be incremented.

In other words, a command that affects multiple drives (e.g., includes chained operations, etc.) results in incrementing a timing bucket for each of the drives (e.g., each of the drives in the chain, etc.). Further, because the >500 ms timing bucket is the highest applicable timing bucket, only the >500 ms timing bucket is incremented, and the >100 ms timing bucket and the >250 ms timing bucket are not incremented.
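The "highest applicable bucket" rule of this example may be sketched as follows, again assuming the four example thresholds:

    # Sketch: only the highest bucket whose threshold the completion time
    # exceeds is incremented; the lower buckets are left untouched.
    BUCKET_THRESHOLDS_MS = (100, 250, 500, 1000)

    def highest_applicable_bucket(completion_ms):
        exceeded = [t for t in BUCKET_THRESHOLDS_MS if completion_ms > t]
        return max(exceeded) if exceeded else None

    assert highest_applicable_bucket(600) == 500   # 600 ms lands in the >500 ms bucket
    assert highest_applicable_bucket(80) is None   # within the allowed time; not counted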

In some embodiments, the timings of the buckets may be absolute. For example, in a system that maintains >100 ms, >250 ms, >500 ms, and >1 s timing buckets for each of the drives of an array, any command that takes between 100 ms and 250 ms to execute may be counted in the >100 ms timing buckets for each of the drives affected by the command.

In other embodiments, the timings of the buckets may be with respect to a baseline. For example, in a system that maintains >100 ms, >250 ms, >500 ms, and >1 s timing buckets for each of the drives of an array, and allows a 50 ms baseline for command execution, any commands that require >100 ms over the baseline (i.e., >150 ms) to execute may be counted in the >100 ms timing buckets for each of the drives affected by the command. Similarly, in the system that maintains >100 ms, >250 ms, >500 ms, and >1 s timing buckets for each of the drives of the array, and allows the 50 ms baseline for command execution, any commands that require >250 ms over the baseline (i.e., >300 ms) to execute may be counted in the >250 ms timing buckets for each of the drives affected by the command.
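Where a baseline is allowed, the same selection may simply be applied to the completion time less the baseline. A sketch, assuming the 50 ms baseline of the example:

    # Sketch: with a 50 ms baseline, a command must take >150 ms to land in the
    # >100 ms bucket, >300 ms to land in the >250 ms bucket, and so on.
    BUCKET_THRESHOLDS_MS = (100, 250, 500, 1000)
    BASELINE_MS = 50

    def highest_applicable_bucket(completion_ms, baseline_ms=BASELINE_MS):
        adjusted = completion_ms - baseline_ms
        exceeded = [t for t in BUCKET_THRESHOLDS_MS if adjusted > t]
        return max(exceeded) if exceeded else None

    assert highest_applicable_bucket(140) is None   # 140 - 50 = 90 ms, not counted
    assert highest_applicable_bucket(160) == 100    # 160 - 50 = 110 ms, >100 ms bucket
    assert highest_applicable_bucket(320) == 250    # 320 - 50 = 270 ms, >250 ms bucket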

In various embodiments, the bucket data may be read. For example, the bucket data may be read periodically (e.g., every 8 hours, 24 hours, 72 hours, etc.). As used herein, the bucket data includes any and all timing bucket counts for the drives of an array. Further still, the bucket data may be stored or copied to a second location. For example, the bucket data may be stored or copied to a controller file in a system, a server, the cloud, etc. The bucket data may be stored or copied to the second location after each periodic reading. Moreover, the bucket data may be stored or copied to the second location for access and analysis by a client. By periodically offloading the bucket data and storing it elsewhere, the bucket data may be maintained for extended periods of time, such as days, weeks, months, etc. The bucket data may be analyzed by computer logic or an individual to identify one or more lethargic drives in the RAID configuration.
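A periodic offload of the bucket data might resemble the following sketch; the file destination and JSON format are assumptions made purely for illustration:

    import json
    import time

    # Sketch: append a timestamped snapshot of all bucket counts to a history
    # file so that the data survives later counter resets and can be analyzed
    # by a client over days, weeks, or months.
    def offload_bucket_data(timing_buckets, path="bucket_history.json"):
        snapshot = {"timestamp": time.time(), "buckets": timing_buckets}
        with open(path, "a") as history:
            history.write(json.dumps(snapshot) + "\n")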

Thresholds may be defined in some embodiments. The thresholds may cause automatic reporting of a drive that has excessive timing bucket counts. For example, the drive may be reported as a lethargic drive.

In one embodiment, a threshold may be with respect to all timing buckets for a drive, or for a single bucket of the drive. For example, a threshold may include a total number of counts in all buckets of the drive, or a total number of counts in one or more single buckets of the drive (e.g., the >500 ms bucket, etc.). In some embodiments, the threshold may be configured such that the higher delay timing buckets (e.g., the >500 ms bucket, >1 s timing bucket, etc.) are more heavily weighted than the lower delay timing buckets (e.g., >100 ms timing bucket, >250 ms timing bucket, etc.).

In another embodiment, the threshold may be with respect to a total cumulative delay reflected by the counts in the timing buckets. For example, a drive with three >100 ms timing bucket counts, two >250 ms timing bucket counts, and one >500 ms timing bucket count may be associated with a total cumulative delay of >1300 ms, which would be in excess of a 1000 ms cumulative delay threshold.

Still yet, in some embodiments, a threshold may be set with respect to other drives in the array. For example, a threshold may be set with respect to a number of counts in one or more of the timing buckets of a given drive with respect to the timing buckets of other drives in the array. For example, a threshold may be configured to report a drive with 10+ more timing bucket counts than any other drive in its RAID configuration. In this manner, the thresholds may be defined to allow automatic reporting of a drive that has excessive counts compared to other drives in the array.
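The three styles of threshold discussed above might be sketched as follows; the weighting by bucket interval and the 10-count margin are illustrative assumptions drawn from the examples:

    # Sketch of three possible threshold checks over a drive's buckets.
    def total_count(drive_buckets):
        # Total number of counts across all of the drive's buckets.
        return sum(drive_buckets.values())

    def cumulative_delay_ms(drive_buckets):
        # Cumulative delay implied by the counts; e.g., three >100 ms counts,
        # two >250 ms counts, and one >500 ms count imply >1300 ms in total.
        return sum(threshold * count for threshold, count in drive_buckets.items())

    def exceeds_relative_threshold(drive_id, timing_buckets, margin=10):
        # Report a drive whose total count is at least 'margin' higher than
        # that of every other drive in the same RAID configuration.
        others = [total_count(b) for d, b in timing_buckets.items() if d != drive_id]
        return total_count(timing_buckets[drive_id]) >= max(others) + margin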

In a typical RAID configuration, where data is striped across all drives in the array, and one or more units of each stripe is reserved to hold parity data calculated from the other units in the stripe (e.g., RAID-5, RAID-6, etc.), the parity units are distributed or rotated throughout the array. Accordingly, a given update operation (e.g., a write or modify operation, etc.) may have an equal chance of hitting any of the drives. However, any update operation also results in an update to the parity unit. Thus, when a parity unit for a stripe is stored on a lethargic drive, and data for the stripe is uniformly distributed across the other drives of the array, then an update to any of the data units in the stripe will result in an update to the parity unit on the lethargic drive. Accordingly, the lethargic drive storing the parity unit will receive more updates than any other single drive. This may result in a great number of delayed commands.

Conversely, when a lethargic drive stores a data unit for a given stripe, the parity unit for the stripe may be stored on a non-lethargic (i.e., operationally normal) drive. In this way, only operations that directly affect the data (e.g., data reads, data writes, and data updates, etc.) on the lethargic drive will be delayed.

Because RAID configurations are generally configured to equally distribute access over the entirety of the array, all resources may be hit in different combinations over an extended period of time. In one embodiment, where n timing bucket counts accumulate over time against a given lethargic drive in a RAID configuration, the timing bucket counts accumulated over time against the other drives of the RAID configuration may approximate (n/w), where w is a width of the array. In other words, the accumulated timing bucket counts may be distributed uniformly over the remaining drives of the array.

Accordingly, as timing bucket counts are accumulated in the timing buckets for the drives of the array, one or more lethargic drives that are responsible for unexpected delays may be identified.

In some embodiments, automation logic or code may fail the lethargic drive without human intervention based on a combination of a differential count, a threshold, and maintaining the differential count over a predefined period of time. For example, the lethargic drive may be failed when it maintains a timing bucket count that is 10+ greater than any other drive in the array for a period of 12 hours. The predefined period of time may be set by a user, such as an administrator.
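Such automation might combine a relative threshold with a persistence requirement, along the lines of the following sketch; the 10-count margin and 12-hour window come from the example above, while everything else is an assumption:

    import time

    # Sketch: fail a drive only if its total count stays at least 'margin'
    # above that of every other drive for 'hold_seconds' without interruption.
    def should_fail_drive(drive_id, timing_buckets, first_exceeded,
                          margin=10, hold_seconds=12 * 3600):
        own = sum(timing_buckets[drive_id].values())
        best_other = max(sum(b.values())
                         for d, b in timing_buckets.items() if d != drive_id)
        if own - best_other < margin:
            first_exceeded.pop(drive_id, None)   # differential not maintained
            return False
        started = first_exceeded.setdefault(drive_id, time.monotonic())
        return time.monotonic() - started >= hold_seconds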

In the various embodiments discussed herein, a drive that is operating below a standard or expected level of operation may slow the execution of commands on a RAID configuration. Tracking the completion time of commands in timing buckets may serve to identify the lethargic drive within the array. A drive with excessive counts in its timing buckets may be a lethargic drive, where the lethargic drive negatively impacts data access in some stripes of the RAID configuration, and negatively impacts parity updates in others.

Further, a drive with an excessive bucket count may be replaced. In one embodiment, after replacing a lethargic drive, the timing buckets may be reset for the drives of the array. In another embodiment, after replacing a lethargic drive in an array, the timing buckets may not be reset, and may continue to be incremented based on the execution time of commands against the array. At a subsequent time, delta-counts may be computed for the timing buckets of the drives of the array, where the delta-counts are differences between timing bucket counts at the subsequent time and the time when the drive was replaced.

The delta-counts may be used to identify the presence of a lethargic drive within the array in the same manner as described above with respect to the counts in the timing buckets. The use of delta-counts for the drives may allow for the identification of a lethargic drive (e.g., by comparing the delta-counts to the thresholds discussed above, etc.) while avoiding the skewing of a particular drive's timing bucket counts due to influence from a removed lethargic drive. For example, if a data block on a given drive is frequently updated, and a replaced lethargic drive previously maintained a parity block for the stripe of the data block, then the given drive may have accumulated a number of timing bucket counts due to the lethargic drive. When the lethargic drive is eventually replaced, the given drive may cease accumulating counts in its timing buckets. Despite the given drive's high number of timing bucket counts, a periodic evaluation of delta-counts of the given drive, and comparison to a threshold, may be utilized to ensure that the given drive is not also lethargic. In one particular embodiment, the delta-counts may be determined and compared to a threshold every 6-10 hours.
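Delta-counts reduce to a per-drive, per-bucket difference between the counts observed now and a snapshot taken when the lethargic drive was replaced; a sketch with illustrative names:

    # Sketch: difference between the current bucket counts and the counts
    # recorded at the time the lethargic drive was replaced. A drive whose
    # delta-counts stay near zero afterwards was most likely only being skewed
    # by chained operations involving the removed drive.
    def delta_counts(current_buckets, baseline_buckets):
        return {
            drive_id: {
                threshold: current_buckets[drive_id][threshold]
                           - baseline_buckets[drive_id][threshold]
                for threshold in current_buckets[drive_id]
            }
            for drive_id in current_buckets
        }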

Referring now to FIG. 4, a diagrammatic layout of commands executing against a RAID configuration 400 is shown, in accordance with one embodiment. As an option, the present RAID configuration 400 may be implemented in conjunction with features from any other embodiment listed herein, such as those described with reference to the other FIGS. Of course, however, such RAID configuration 400 and others presented herein may be used in various applications and/or in permutations which may or may not be specifically described in the illustrative embodiments listed herein. Further, the RAID configuration 400 presented herein may be used in any desired environment.

As shown in FIG. 4, four drives (disk 0, disk 1, disk 2, and disk 3) in a RAID-5 configuration generally comprise the RAID configuration 400. Further, for purposes of simplicity, the array 400 is shown within FIG. 4 to comprise twelve stripes (stripes 0-11) spread across the four drives. In particular, each of the stripes is shown to include a parity block and three data blocks. For example, stripe 0 is shown to include parity block P-0 maintained on disk 0, and data blocks D-0, D-1, and D-2, maintained on disk 1, disk 2, and disk 3, respectively. Similarly, stripe 11 is shown to include data blocks D-33, D-34, and D-35, maintained on disk 0, disk 1, and disk 2, respectively, as well as parity block P-11 maintained on disk 3. Within the RAID configuration 400, the parity blocks and data blocks are evenly distributed across the four drives. In other words, amongst the 12 illustrated stripes, each drive stores 9 data blocks and 3 parity blocks.
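For this example only, the parity rotation described for FIG. 4 can be reproduced with a simple modulo mapping; this is merely one rotation consistent with the figure as described, and real RAID-5 layouts may rotate parity differently:

    # Sketch: one parity placement consistent with the description of FIG. 4
    # (parity P-0 on disk 0 for stripe 0, parity P-11 on disk 3 for stripe 11).
    NUM_DISKS = 4

    def parity_disk(stripe):
        return stripe % NUM_DISKS

    def data_disks(stripe):
        return [d for d in range(NUM_DISKS) if d != parity_disk(stripe)]

    assert parity_disk(0) == 0 and parity_disk(11) == 3
    # Each disk holds parity for 3 of the 12 stripes, so an update to a random
    # stripe has an equal chance of touching any given disk's parity block.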

Additionally, FIG. 4 is shown to illustrate the target blocks of read and/or write operations due to a plurality of commands 401-415. For the purposes of the present illustration, the commands 401-406 are illustrated to include read operations. Accordingly, as illustrated within FIG. 4, the commands 401-406 that include only read operations are shown to access only data blocks. For example, the command 402 is shown to access data blocks D-12 and D-13, of disk 1 and disk 2, respectively.

As illustrated within FIG. 4, the commands 411-415 are illustrated as including write operations. For example, the command 411 may include a read operation of data block D-4 and a write operation to data block D-5, which results in an update to parity block P-1. Similarly, a write or update operation to data block D-21 and/or data block D-22 results in an update to the parity block P-7 during execution of command 414.

For purposes of simplicity, the commands 401-415 are illustrated as being non-overlapping. In other words, none of the commands 401-415 are shown to read or write a data block or parity block that has previously been read or written by another of the commands 401-415. This is simply to aid in the instant description, and it is understood that read or write access of commands is not intended to be limited in any manner. For example, it is contemplated that a first command may include a first chain of operations that affect a plurality of data blocks and/or parity blocks, and some subset of those data blocks and/or parity blocks may be accessed during a subsequent command including a second chain of operations.

With continued reference to FIG. 4, execution of each of the commands 401-415 against RAID configuration 400 has been characterized within Table 1.

TABLE 1

  COMMAND #    DISKS INVOLVED       TYPE    TIME > ALLOWED
  401          1, 2, 3              R       >250 ms
  402          1, 2                 R       >100 ms
  403          2, 3                 R       —
  404          0                    R       —
  405          2, 3                 R       —
  406          0, 2                 R       >100 ms
  411          2, 3, (1)            R/W     >500 ms
  412          0, 1, 3, (2), (3)    R/W     >1 s
  413          0, (1)               R/W     —
  414          0, 1, (3)            R/W     —
  415          0, 2, (3)            R/W     >250 ms

As illustrated by Table 1, multiple commands have been executed with a completion time that is greater than desired. Specifically, the commands 401, 402, 406, 411, 412, and 415 have been executed with a completion time that is greater than allowed. As noted above, each time a command executes with a completion time that is greater than allowed, the completion time of the command may be counted in a timing bucket for each drive that the command is directed to. In other words, the completion time of the command may be counted in a timing bucket for each drive that the command is known to perform I/O operations on.

For example, the command 401 is characterized as a command including only read operations that are directed to data blocks of disk 1, disk 2, and disk 3. Moreover, a completion time of the command 401 was determined to count against a >250 ms timing bucket for each of disk 1, disk 2, and disk 3.

For illustrative purposes only, each of the disks 0-3 is associated with its own corresponding set of the following timing buckets: >100 ms, >250 ms, >500 ms, and >1 s. As noted above, the disks may be associated with greater than or fewer than four timing buckets, as well as timing buckets associated with other timing intervals.

Referring again to FIG. 4 and Table 1, the command 411 is characterized as a command including read/write operations that are directed to data blocks of disks 2 and 3, as well as a parity block of disk 1. A completion time of the command 411 was determined to count against a >500 ms timing bucket for each of disk 1, disk 2, and disk 3. Similarly, the command 412 is characterized as a command including read/write operations that are directed to data blocks of disks 0, 1, and 3, as well as parity blocks of disks 2 and 3. Additionally, a completion time of the command 412 was determined to count against a >1 s timing bucket for each of disk 0, disk 1, disk 2, and disk 3.
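The accumulation that produces Table 2 can be re-derived from Table 1 with a short sketch; disk numbers shown in parentheses in Table 1 denote parity accesses and are counted like any other access:

    # Sketch: rebuild the Table 2 counts from the slow commands of Table 1.
    # Each slow command adds one count to the applicable bucket of every disk
    # that it was directed to.
    SLOW_COMMANDS = {        # command: (disks involved, bucket threshold in ms)
        401: ((1, 2, 3), 250),
        402: ((1, 2), 100),
        406: ((0, 2), 100),
        411: ((1, 2, 3), 500),
        412: ((0, 1, 2, 3), 1000),
        415: ((0, 2, 3), 250),
    }

    buckets = {disk: {100: 0, 250: 0, 500: 0, 1000: 0} for disk in range(4)}
    for disks, threshold in SLOW_COMMANDS.values():
        for disk in disks:
            buckets[disk][threshold] += 1

    cumulative_ms = {disk: sum(t * c for t, c in b.items())
                     for disk, b in buckets.items()}
    assert cumulative_ms == {0: 1350, 1: 1850, 2: 2200, 3: 2000}   # matches Table 2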

Table 2 shows an accumulation of counts for the timing buckets for each of the disks 0-3 of FIG. 4. Table 2 also shows a cumulative time for each of the disks 0-3 based upon the counts of the timing buckets.

TABLE 2

  DISK #    >100 ms    >250 ms    >500 ms    >1 s    CUMULATIVE TIME
  DISK 0    1          1          —          1       >1350 ms
  DISK 1    1          1          1          1       >1850 ms
  DISK 2    2          2          1          1       >2200 ms
  DISK 3    —          2          1          1       >2000 ms

As shown in Table 2, disk 2 has an accumulated count of 2 in the >100 ms timing bucket, an accumulated count of 2 in the >250 ms timing bucket, an accumulated count of 1 in the >500 ms timing bucket, and an accumulated count of 1 in the >1 s timing bucket. Further, disk 2, based on the accumulated counts in the timing buckets, is associated with a cumulative time delay of >2200 ms. The accumulated counts and cumulative time of disk 2 are greater than the accumulated counts and cumulative time for the other disks 0, 1, and 3.

Accordingly, based on the timing bucket counts, it may be determined that disk 2 is a lethargic drive that is negatively impacting the performance of the array 400. Further, disk 2 may be identified as the lethargic drive in the array 400 despite the fact that all commands directed to disk 2 included chained operations directed to disks other than disk 2. Accordingly, using the methods and systems described above, a lethargic drive within a RAID configuration may be identified even without the ability to time operations directed to individual drives.

Still yet, it should be noted that the command 405 is characterized as a command including read operations that are directed to data blocks of disk 2 and disk 3. Moreover, a completion time of the command 405 was determined to not count against any timing buckets for disk 2 or disk 3. Accordingly, although disk 2 has been characterized as a lethargic drive in the array 400, it is understood that not all I/O operations on disk 2 may be affected by its status. In other words, not all I/O operations on a lethargic drive may be sufficiently delayed to increase a count of one or more timing buckets.

Disk 2 may be identified as a lethargic drive by a user that analyzes the timing bucket data of the disks of the array 400, or by automated logic that analyzes the timing bucket data of the disks of the array 400. In response to identifying disk 2 as a lethargic drive, disk 2 may be replaced by another disk. Moreover, continued monitoring of the array 400 may be employed to determine whether the accumulated counts in the timing buckets of disk 3 also indicate disk 3 as being a lethargic disk, or instead indicate that the accumulated counts in the timing buckets of disk 3 were being skewed by commands including chained operations that were directed to both lethargic disk 2 and disk 3. This may be done using some combination of delta-counts and thresholds, as discussed above.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

Moreover, a system according to various embodiments may include a processor and logic integrated with and/or executable by the processor, the logic being configured to perform one or more of the process steps recited herein. By integrated with, what is meant is that the processor has logic embedded therewith as hardware logic, such as an application specific integrated circuit (ASIC), a FPGA, etc. By executable by the processor, what is meant is that the logic is hardware logic; software logic such as firmware, part of an operating system, part of an application program; etc., or some combination of hardware and software logic that is accessible by the processor and configured to cause the processor to perform some functionality upon execution by the processor. Software logic may be stored on local and/or remote memory of any memory type, as known in the art. Any processor known in the art may be used, such as a software processor module and/or a hardware processor such as an ASIC, a FPGA, a central processing unit (CPU), an integrated circuit (IC), a graphics processing unit (GPU), etc.

It will be clear that the various features of the foregoing systems and/or methodologies may be combined in any way, creating a plurality of combinations from the descriptions presented above.

It will be further appreciated that embodiments of the present invention may be provided in the form of a service deployed on behalf of a customer to offer service on demand.

While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

What is claimed is:
 1. A method, comprising: for each of a plurality of drives in a redundant array of independent disks (RAID) configuration, identifying a plurality of timing buckets unique to the drive, where: each of the plurality of timing buckets is associated with a different time interval, and each of the plurality of timing buckets includes a counter, the counter indicating a number of times a completion time of a prior command directed to the drive matched the time interval associated with the timing bucket, the completion time including a time to completion of an execution of the command; for each of the plurality of drives, comparing the counter of one or more of the plurality of timing buckets unique to the drive to a predetermined threshold; and identifying a lethargic drive within the plurality of drives, based on the comparing.
 2. The method of claim 1, wherein, for each of the plurality of drives, the predetermined threshold includes a total number of counts in all of the plurality of timing buckets unique to the drive.
 3. The method of claim 1, wherein, for each of the plurality of drives, the predetermined threshold includes a total number of counts in one of the plurality of timing buckets unique to the drive.
 4. The method of claim 1, wherein, for each of the plurality of drives, the drive is identified as the lethargic drive when the counter of one or more of the plurality of timing buckets unique to the drive exceeds the predetermined threshold.
 5. The method of claim 1, wherein the comparing is performed on a predetermined periodic basis.
 6. The method of claim 1, further comprising reporting the lethargic drive.
 7. The method of claim 1, wherein the predetermined threshold is set with respect to other drives in the RAID configuration.
 8. The method of claim 1, wherein the lethargic drive is identified when the total number of counts in all the timing buckets of the lethargic drive is greater than a total number of counts in all of the remaining plurality of drives.
 9. The method of claim 1, wherein the lethargic drive is failed based on a combination of a differential count, the predetermined threshold, and maintaining the differential count over a predefined period of time.
 10. The method of claim 9, wherein the predefined period of time is set by an administrator.
 11. A system, comprising: a processor; and logic integrated with the processor, executable by the processor, or integrated with and executable by the processor, the logic being configured to: for each of a plurality of drives in a redundant array of independent disks (RAID) configuration, identify a plurality of timing buckets unique to the drive, where: each of the plurality of timing buckets is associated with a different time interval, and each of the plurality of timing buckets includes a counter, the counter indicating a number of times a completion time of a prior command directed to the drive matched the time interval associated with the timing bucket, the completion time including a time to completion of an execution of the command; for each of the plurality of drives, compare the counter of one or more of the plurality of timing buckets unique to the drive to a predetermined threshold; and identify a lethargic drive within the plurality of drives, based on the comparing.
 12. A computer program product for identifying a lethargic drive, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: identify, for each of a plurality of drives in a redundant array of independent disks (RAID) configuration, a plurality of timing buckets unique to the drive, utilizing the processor, where: each of the plurality of timing buckets is associated with a different time interval, and each of the plurality of timing buckets includes a counter, the counter indicating a number of times a completion time of a prior command directed to the drive matched the time interval associated with the timing bucket, the completion time including a time to completion of an execution of the command; compare, for each of the plurality of drives, the counter of one or more of the plurality of timing buckets unique to the drive to a predetermined threshold, utilizing the processor; and identify, utilizing the processor, a lethargic drive within the plurality of drives, based on the comparing.
 13. The computer program product of claim 12, wherein, for each of the plurality of drives, the predetermined threshold includes a total number of counts in all of the plurality of timing buckets unique to the drive.
 14. The computer program product of claim 12, wherein, for each of the plurality of drives, the predetermined threshold includes a total number of counts in one of the plurality of timing buckets unique to the drive.
 15. The computer program product of claim 12, wherein, for each of the plurality of drives, the drive is identified as the lethargic drive when the counter of one or more of the plurality of timing buckets unique to the drive exceeds the predetermined threshold.
 16. The computer program product of claim 12, wherein the comparing is performed on a predetermined periodic basis.
 17. The computer program product of claim 12, further comprising reporting the lethargic drive.
 18. The computer program product of claim 12, wherein the predetermined threshold is set with respect to other drives in the RAID configuration.
 19. The computer program product of claim 12, wherein the lethargic drive is identified when the total number of counts in all the timing buckets of the lethargic drive is greater than a total number of counts in all of the remaining plurality of drives.
 20. The computer program product of claim 12, wherein the lethargic drive is failed based on a combination of a differential count, the predetermined threshold, and maintaining the differential count over a predefined period of time.